
    Partial bisulfite conversion for unique template sequencing

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate counting and long-range assembly of initial template molecules from short-read sequence data. We explore counting and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone.
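
    The clustering idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, only top-strand C-to-T conversions are modelled (G-to-A is ignored), every read is assumed to span the full reference, and reads are grouped by exact signature match with no tolerance for sequencing error.

```python
import random
from collections import defaultdict


def partial_convert(template, rate, rng):
    """Simulate partial bisulfite conversion (hypothetical helper).

    Each C is turned into T independently with probability `rate`.
    """
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in template
    )


def conversion_signature(read, reference):
    """Return the set of positions where a reference C is read as T."""
    return frozenset(
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "T"
    )


def cluster_by_signature(reads, reference):
    """Group reads that share an identical conversion signature.

    Assumption: each distinct signature corresponds to one initial template,
    so the number of clusters estimates the template count.
    """
    clusters = defaultdict(list)
    for read in reads:
        clusters[conversion_signature(read, reference)].append(read)
    return clusters
```

    Under these assumptions, `len(cluster_by_signature(reads, reference))` gives a template count that is unaffected by how many times each template was copied during amplification.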

    Low load for disruptive mutations in autism genes and their biased transmission

    We previously computed that genes with de novo (DN) likely gene-disruptive (LGD) mutations in children with autism spectrum disorders (ASD) have high vulnerability: disruptive mutations in many of these genes, the vulnerable autism genes, will have a high likelihood of resulting in ASD. Because individuals with ASD have lower fecundity, such mutations in autism genes would be under strong negative selection pressure. An immediate prediction is that these genes will have a lower LGD load than typical genes in the human gene pool. We confirm this hypothesis in an explicit test by measuring the load of disruptive mutations in whole-exome sequence databases from two cohorts. We use information about mutational load to show that affected individuals with lower and higher intelligence quotients (IQ) can be distinguished by the mutational load in their respective gene targets, as well as to help prioritize gene targets by their likelihood of being autism genes. Moreover, we demonstrate that transmission of rare disruptions in genes with a lower LGD load occurs more often to affected offspring; we show transmission originates most often from the mother, and transmission of such variants is seen more often in offspring with lower IQ. A surprising proportion of transmission of these rare events comes from genes expressed in the embryonic brain that show sharply reduced expression shortly after birth.

    SMASH, a fragmentation and sequencing method for genomic copy number analysis

    Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches suffer from either limited resolution (CMA) or high cost for routine screening (both CMA and WGS). As an alternative, we have developed a next-generation sequencing-based method for CNV analysis termed SMASH, for short multiply aggregated sequence homologies. SMASH utilizes random fragmentation of input genomic DNA to create chimeric sequence reads, from which multiple mappable tags can be parsed using maximal almost-unique matches (MAMs). The SMASH tags are then binned and segmented, generating a profile of genomic copy number at the desired resolution. Because fewer reads are necessary relative to WGS to give accurate CNV data, SMASH libraries can be highly multiplexed, allowing large numbers of individuals to be analyzed at low cost. Increased genomic resolution can be achieved by sequencing to higher depth.
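
    The binning step described in this abstract can be sketched in Python as follows. This is an illustration only, not the SMASH pipeline: it assumes fixed-width bins, a matched reference sample for normalization, and a diploid baseline, and it omits both MAM parsing and segmentation. The function names and the `ploidy` parameter are hypothetical.

```python
from collections import Counter


def bin_counts(tag_positions, bin_size, n_bins):
    """Count mapped tag positions in fixed-width genomic bins."""
    counts = Counter(pos // bin_size for pos in tag_positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def copy_number_profile(sample_counts, reference_counts, ploidy=2):
    """Estimate per-bin copy number from sample and reference bin counts.

    Each bin's share of the sample total is divided by its share of the
    reference total, then scaled by the assumed baseline ploidy.
    Bins with no reference tags are reported as None.
    """
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    profile = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            profile.append(None)
        else:
            profile.append(ploidy * (s / sample_total) / (r / reference_total))
    return profile
```

    In this sketch a smaller `bin_size` gives finer resolution but fewer tags per bin, which is consistent with the abstract's remark that higher resolution requires deeper sequencing.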

    Reducing INDEL calling errors in whole genome and exome sequencing data

    BACKGROUND: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. METHODS: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and found high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%). RESULTS: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. CONCLUSIONS: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.

    DNA methylation in insects

    Cytosine DNA methylation has been demonstrated in numerous eukaryotic organisms and has been shown to play an important role in human disease. The function of DNA methylation has been studied extensively in vertebrates, but establishing its primary role has proved difficult and controversial. Analysing methylation in insects has indicated an apparent functional diversity that seems to argue against a strict functional conservation. To investigate this hypothesis, we here assess the data reported in four different insect species in which DNA methylation has been analysed more thoroughly: the fruit fly Drosophila melanogaster, the cabbage moth Mamestra brassicae, the peach-potato aphid Myzus persicae and the mealybug Planococcus citri.

    Arabidopsis thaliana telomeres exhibit euchromatic features

    Telomere function is influenced by chromatin structure and organization, which usually involves epigenetic modifications. We describe here the chromatin structure of Arabidopsis thaliana telomeres. Based on the study of six different epigenetic marks we show that Arabidopsis telomeres exhibit euchromatic features. In contrast, subtelomeric regions and telomeric sequences present at interstitial chromosomal loci are heterochromatic. Histone methyltransferases and the chromatin remodeling protein DDM1 control subtelomeric heterochromatin formation. Whereas histone methyltransferases are required for histone H3K9me2 and non-CpG DNA methylation, DDM1 directs CpG methylation but not H3K9me2 or non-CpG methylation. These results argue that both kinds of proteins participate in different pathways to reinforce subtelomeric heterochromatin formation.

    Characterization of Unique Small RNA Populations from Rice Grain

    Small RNAs (∼20 to 24 nucleotides) function as naturally occurring molecules critical in developmental pathways in plants and animals [1], [2]. Here we analyze small RNA populations from mature rice grain and seedlings by pyrosequencing. Using a clustering algorithm to locate regions producing small RNAs, we classified hotspots of small RNA generation within the genome. Hotspots here are defined as 1 kb regions within which small RNAs are significantly overproduced relative to the rest of the genome. Hotspots were identified to facilitate characterization of different categories of small RNA regulatory elements. Included in the hotspots, we found known members of 23 miRNA families representing 92 genes, one trans-acting siRNA (ta-siRNA) gene, novel siRNA-generating coding genes and phased siRNA-generating genes. Interestingly, over 20% of the small RNA population in grain came from a single foldback structure, which generated eight phased 21-nt siRNAs. This is reminiscent of a newly arising miRNA derived from duplication of progenitor genes [3], [4]. Our results provide data identifying distinct populations of small RNAs, including phased small RNAs, in mature grain to facilitate characterization of small regulatory RNA expression in monocot species.
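
    The hotspot definition in this abstract (1 kb regions with overproduced small RNAs) can be illustrated with a short Python sketch. The abstract does not state which clustering algorithm or significance test was used, so this example substitutes a simple fold-over-genome-mean threshold on non-overlapping windows. The function name and the `min_fold` parameter are hypothetical.

```python
from collections import Counter


def find_hotspots(read_starts, genome_length, window=1000, min_fold=10.0):
    """Return start coordinates of windows enriched for small RNA reads.

    Assumption: a window is a hotspot when its read count is at least
    `min_fold` times the mean count per window across the genome. This
    stands in for the unspecified significance test in the source.
    """
    if genome_length <= 0 or not read_starts:
        return []
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = Counter(pos // window for pos in read_starts)
    mean_per_window = len(read_starts) / n_windows
    threshold = min_fold * mean_per_window
    return sorted(w * window for w, c in counts.items() if c >= threshold)
```

    A real analysis would more likely use a count-based statistical model and overlapping or sliding windows; the fixed threshold here only shows the shape of the computation.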